Data exploration with learning metrics

Author

  • Jaakko Peltonen
Abstract

A crucial problem in exploratory analysis of data is that it is difficult for computational methods to focus on interesting aspects of data. Traditional methods of unsupervised learning cannot differentiate between interesting and noninteresting variation, and hence may model, visualize, or cluster parts of data that are not interesting to the analyst. This wastes the computational power of the methods and may mislead the analyst. In this thesis, a principle called “learning metrics” is used to develop visualization and clustering methods that automatically focus on the interesting aspects, based on auxiliary labels supplied with the data samples. The principle yields non-Euclidean (Riemannian) metrics that are data-driven, widely applicable, versatile, invariant to many transformations, and in part invariant to noise. Learning metric methods are introduced for five tasks: nonlinear visualization by Self-Organizing Maps and Multidimensional Scaling, linear projection, and clustering of discrete data and multinomial distributions. The resulting methods either explicitly estimate distances in the Riemannian metric, or optimize a tailored cost function which is implicitly related to such a metric. The methods have rigorous theoretical relationships to information geometry and probabilistic modeling, and are empirically shown to yield good practical results in exploratory and information retrieval tasks.
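
As a rough illustration of what "explicitly estimating distances in the Riemannian metric" could look like, the sketch below computes a local squared distance weighted by the Fisher information of a class-posterior model p(c|x), so that only displacements that change the label distribution count as distance. The abstract gives no formula, so the Fisher-information form, the softmax posterior model, and all names (`class_posterior`, `local_distance_sq`, `W`, `b`) are assumptions made for illustration and are not the estimators used in the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_posterior(x, weights, biases):
    """Hypothetical model of p(c | x): softmax over linear scores."""
    scores = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + bias
        for w, bias in zip(weights, biases)
    ]
    return softmax(scores)


def local_distance_sq(x, dx, weights, biases):
    """Squared local distance dx^T J(x) dx, with J(x) the Fisher
    information of p(c | x) with respect to x (an assumed form)."""
    p = class_posterior(x, weights, biases)
    dim = len(x)
    # Posterior-weighted mean of the class weight vectors.
    mean_w = [
        sum(p[c] * weights[c][i] for c in range(len(p))) for i in range(dim)
    ]
    dist_sq = 0.0
    for c, p_c in enumerate(p):
        # For this softmax model, grad_x log p(c | x) = w_c - mean_w.
        proj = sum((weights[c][i] - mean_w[i]) * dx[i] for i in range(dim))
        dist_sq += p_c * proj * proj
    return dist_sq


# Toy setup: two classes whose posterior depends only on the first coordinate.
W = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.0, 0.0]
along_relevant = local_distance_sq([0.0, 0.0], [0.1, 0.0], W, b)
along_irrelevant = local_distance_sq([0.0, 0.0], [0.0, 0.1], W, b)
```

In this toy setup a step along the first coordinate gives a squared distance of about 0.01, while a step of the same Euclidean length along the second coordinate gives zero, because it leaves the class posterior unchanged. This is the sense in which such a metric ignores variation that is irrelevant to the auxiliary labels.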

Similar resources

Region Directed Diffusion in Sensor Network Using Learning Automata:RDDLA

One of the main challenges in wireless sensor networks is the energy problem and the life cycle of nodes in the network. Several methods can be used to increase the life cycle of nodes. One of these methods is load balancing across nodes while transmitting data from source to destination. The directed diffusion algorithm, a data-oriented algorithm, is one of the methods proposed for wireless sensor networks. Direct...

Investigating the Impact of Organizational Learning and Marketing Metrics on the Performance of Marketing (Case Study: Elon Plast Company)

The aim of this study was to analyze the impact of organizational learning and marketing metrics on the marketing performance in the Elon Plast Company of Kermanshah province. It is a functional purpose study with descriptive – survey method. The statistical population includes 100 employees of Elon Plast Company in Kermanshah province. A sample of 80 people was chosen using Cochran formula. Da...

Exploration of Arak Medical Students’ Experiences on Effective Factors in Active Learning: A Qualitative Research

Introduction: Medical students should use active learning to improve their daily duties and medical services. The goal of this study is to explore medical students’ experiences of the factors that are effective in active learning. Methods: This qualitative study was conducted using the content analysis method at Arak University of Medical Sciences. Data were collected via interviews. The study started with p...

Structuration de bases multimédia pour une exploration visuelle. (Structuring multimedia bases for visual exploration)

The large increase in multimedia data volume requires the development of effective solutions for visual exploration of multimedia databases. After reviewing the visualization process involved, we emphasize the need for data structuring. The main objective of this thesis is to propose and study clustering and classification of multimedia databases for their visual exploration. We begin with a sta...

From learning metrics towards dependency exploration

We have recently introduced new kinds of data fusion techniques, where the goal is to find what is shared by data sets, instead of modeling all variation in data. They extend our earlier works on learning of distance metrics, discriminative clustering, and other supervised statistical data mining methods. In the new methods the supervision is symmetric, which translates to mining of dependencie...

Publication date: 2004